Load and read the met dataset. I previously downloaded the dataset to my GitHub repo (it is listed in .gitignore), so I will not download it again.
library(dplyr)
library(ggplot2)
met <- read.csv("data/met_all.gz")
Data Pre-Prep
Things to do:
Remove temperatures less than -17C
Make sure there is no missing data in the key variables coded as 9999, 999, etc.
Generate a date variable using the function as.Date() (hint: you will need paste(year, month, day, sep = "-") to build the date string).
Subset the data to keep only the observations from the first week (ie. first 7 days) of the month.
Compute the mean by station of the variables temp, rh, wind.sp, vis.dist, dew.point, lat, lon, and elev.
Create a region variable for NW, SW, NE, SE based on lon = and lat = degrees
Create a categorical variable for elevation as in the lecture slides
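The pre-processing code for these steps is not shown in this write-up. Below is a minimal dplyr sketch of them, written under several assumptions. The longitude/latitude cutoffs and the elevation threshold are not given here, so they are left as arguments (`lon_cut`, `lat_cut`, `elev_cut`; the helper `prep_met` and all three names are hypothetical). `USAFID` is assumed to be the station identifier column. The "high"/"low" labels for the elevation category are an assumption. `atm.press` is averaged as well because the ridgeline plot later uses it.

```r
library(dplyr)

# Hypothetical helper sketching the pre-processing steps listed above.
# lon_cut, lat_cut and elev_cut are placeholder arguments: the real cutoffs
# are not stated in this write-up. na_codes are the assumed sentinel codes.
prep_met <- function(met, lon_cut, lat_cut, elev_cut,
                     na_codes = c(999, 9999)) {
  # Assumed "key variables" to check for sentinel codes
  key_vars <- c("temp", "rh", "wind.sp", "vis.dist", "dew.point", "elev")
  # Variables averaged per station (atm.press added for the ridgeline plot)
  mean_vars <- c("temp", "rh", "wind.sp", "vis.dist", "dew.point",
                 "lat", "lon", "elev", "atm.press")

  met %>%
    # 1. Recode sentinel missing-value codes in the key variables to NA
    mutate(across(all_of(key_vars), ~ replace(.x, .x %in% na_codes, NA))) %>%
    # 2. Drop temperatures below -17 C (rows with missing temp are kept)
    filter(is.na(temp) | temp >= -17) %>%
    # 3. Build a Date from year, month and day
    mutate(date = as.Date(paste(year, month, day, sep = "-"))) %>%
    # 4. Keep only the first 7 days of the month
    filter(as.integer(format(date, "%d")) <= 7) %>%
    # 5. Station-level means (USAFID assumed to be the station ID)
    group_by(USAFID) %>%
    summarise(across(all_of(mean_vars), ~ mean(.x, na.rm = TRUE)),
              .groups = "drop") %>%
    mutate(
      # 6. Quadrant region from the longitude/latitude cutoffs
      region = case_when(
        lon <  lon_cut & lat >= lat_cut ~ "NW",
        lon <  lon_cut & lat <  lat_cut ~ "SW",
        lon >= lon_cut & lat >= lat_cut ~ "NE",
        lon >= lon_cut & lat <  lat_cut ~ "SE"
      ),
      # 7. Two-level elevation category from an elevation threshold
      elev_cat = ifelse(elev > elev_cut, "high", "low")
    )
}
```

You would call it as `met_avg <- prep_met(met, lon_cut, lat_cut, elev_cut)`, after filling in the cutoffs from the assignment and the lecture slides.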
The violin plots reveal distinct regional patterns in wind speed distributions. The Northeast shows a symmetric distribution, with most values clustered around 3 m/s, while the Northwest displays greater variability. The Southeast exhibits less variability, whereas the Southwest shows the greatest variation in wind speed of all the regions. Overall, the Northeast appears to have the highest wind speeds, while the Southwest shows the most concentrated distribution around 3 m/s.
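The violin-plot code behind this description is not shown. The sketch below produces such a plot and assumes a station-level data frame like `met_avg` with `wind.sp` and `region` columns. The helper name `plot_wind_violin` is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: violin plot of station mean wind speed by region.
# Rows with a missing wind speed or region are dropped before plotting.
plot_wind_violin <- function(df) {
  df <- df[!is.na(df$wind.sp) & !is.na(df$region), ]
  ggplot(df, aes(x = region, y = wind.sp, fill = region)) +
    geom_violin() +
    labs(x = "Region", y = "Wind speed (m/s)", fill = "Region") +
    theme_bw()
}
```

To draw it, call `plot_wind_violin(met_avg)`. The same pattern with `y = dew.point` gives the corresponding dew point plot.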
Use geom_jitter with stat_smooth to examine the association between dew point and wind speed by region
Deal with NAs
Color points by region
Fit Linear Regression by Region
Plot Description:
The scatter plot illustrates the relationship between dew point and relative humidity across all regions. All regional regression lines exhibit positive slopes, indicating that relative humidity increases as the dew point rises. The relationship appears strongest in the Northwest and Northeast regions, while the Southeast and Southwest show a more moderate association.
# Dew point and relative humidity (NAs removed with !is.na() before plotting)
met_avg %>%
  filter(!is.na(dew.point) & !is.na(rh)) %>%
  ggplot(mapping = aes(x = dew.point, y = rh, colour = region)) +
  geom_jitter() +
  stat_smooth(method = lm, formula = y ~ x)

# Dew point and wind speed
met_avg %>%
  filter(!is.na(dew.point) & !is.na(wind.sp)) %>%
  ggplot(mapping = aes(x = dew.point, y = wind.sp, colour = region)) +
  geom_jitter() +
  stat_smooth(method = lm, formula = y ~ x)
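Comparing regions by eye only goes so far. The sketch below fits the same straight-line model separately in each region and reports the slope and R², which separates "steepest" from "tightest" fit. The helper `fit_by_region` is hypothetical and assumes `dew.point`, `rh` and `region` columns as in `met_avg`.

```r
# Hypothetical helper: fit y ~ x separately within each region and return
# the slope, R^2 and number of stations used for each region.
fit_by_region <- function(df, x = "dew.point", y = "rh") {
  df <- df[!is.na(df[[x]]) & !is.na(df[[y]]) & !is.na(df$region), ]
  rows <- lapply(split(df, df$region), function(d) {
    fit <- lm(reformulate(x, response = y), data = d)
    data.frame(region = d$region[1],
               slope = unname(coef(fit)[2]),
               r_squared = summary(fit)$r.squared,
               n = nrow(d))
  })
  do.call(rbind, rows)
}
```

`fit_by_region(met_avg)` gives the dew point to relative humidity fits. `fit_by_region(met_avg, y = "wind.sp")` gives the wind speed version.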
Use geom_bar to create barplots of the weather stations by elevation category colored by region
Deal with NAs
Plot bars by category "elevation" using position = "dodge"
Change colors from the default color by region using scale_fill_brewer
Add labels and titles
Plot Description:
The bar chart highlights differences in weather station distribution across regions and elevation categories. The Southeast region contains the highest number of stations overall, most of which are at low elevations. The Northeast region has a greater number of stations at higher elevations. The Northwest region has the fewest stations in total, but a high proportion of them are at elevated locations. The Southwest region falls in between, with a moderate number of stations, many of which are also located at higher elevations.
met_avg %>%
  filter(!is.na(elev_cat)) %>%
  ggplot() +
  geom_bar(mapping = aes(x = elev_cat, fill = region), position = "dodge") +
  scale_fill_brewer(palette = "PuOr") +
  labs(title = "Number of Weather Stations by Elevation Category and Region",
       x = "Elevation Category", y = "Count") +
  theme_bw()
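To check the comparisons in the description, you can tabulate the counts behind the dodged bars directly. Here is a minimal sketch with `dplyr::count()`. The helper name `count_stations` is hypothetical, and the function assumes `elev_cat` and `region` columns as in `met_avg`.

```r
library(dplyr)

# Hypothetical helper: number of stations per elevation category x region,
# dropping stations with a missing elevation category (as the plot does).
count_stations <- function(df) {
  df %>%
    filter(!is.na(elev_cat)) %>%
    count(elev_cat, region, name = "n_stations")
}
```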
Use stat_summary to examine mean dew point and wind speed by region, with standard deviation error bars.
Plot Description:
The Southeast region exhibits the highest mean dew point, while the Northeast has the lowest. The error bars indicate that the Southwest experiences the greatest variability, reflecting a wider range of dew point values than the other regions. For wind speed, the Northwest and Southwest regions show higher mean wind speeds, around 3 m/s, along with greater variability in wind conditions. In contrast, the Northeast and Southeast regions display lower mean wind speeds with less variability.
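# mean_sdl() is a ggplot2 wrapper around Hmisc::smean.sdl(), so Hmisc must be installed.
# mult = 1 draws mean +/- 1 SD (the default, mult = 2, draws +/- 2 SD).

# Dew point by region
met_avg %>%
  filter(!is.na(dew.point) & !is.na(region)) %>%
  ggplot(mapping = aes(x = region, y = dew.point)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "errorbar") +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1))

# Wind speed by region
met_avg %>%
  filter(!is.na(wind.sp) & !is.na(region)) %>%
  ggplot(mapping = aes(x = region, y = wind.sp)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "errorbar") +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1))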
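ggplot2's `mean_sdl()` uses `mult = 2` by default, so unless `mult = 1` is passed, the error bars span the mean ± 2 SD rather than ± 1 SD. The sketch below computes the same bar limits directly, which is useful for checking what the plot shows. The helper name `summarise_sd` is hypothetical, and the function assumes a `region` column as in `met_avg`.

```r
library(dplyr)

# Hypothetical helper: per-region mean and SD of `var`, plus error-bar limits
# mean -/+ mult * SD (mult = 1 for one standard deviation).
summarise_sd <- function(df, var, mult = 1) {
  df %>%
    filter(!is.na(.data[[var]]), !is.na(region)) %>%
    group_by(region) %>%
    summarise(mean_val = mean(.data[[var]]),
              sd_val = sd(.data[[var]]),
              .groups = "drop") %>%
    mutate(ymin = mean_val - mult * sd_val,
           ymax = mean_val + mult * sd_val)
}
```

For example, `summarise_sd(met_avg, "dew.point")` returns one row per region.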
Make a map showing the spatial trend in relative humidity in the US
Plot Description:
The relative humidity map reveals a distinct east-west gradient across the United States. The eastern regions have higher relative humidity values, the central regions show moderate levels, and the western regions exhibit lower values. The ten locations with the highest relative humidity are concentrated primarily in the eastern and southeastern United States.
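library(leaflet)

# Drop stations with missing relative humidity
met_avg2 <- met_avg[!is.na(met_avg$rh), ]

# Ten stations with the highest relative humidity; rank() is computed on
# met_avg2 so the logical index has exactly one entry per row of met_avg2
top10 <- met_avg2[rank(-met_avg2$rh) <= 10, ]

# Continuous blue -> purple -> red palette over the observed rh range
rh_pal <- colorNumeric(c('blue', 'purple', 'red'), domain = met_avg2$rh)

leaflet(met_avg2) %>%
  addProviderTiles('OpenStreetMap') %>%
  addCircles(lat = ~lat, lng = ~lon, color = ~rh_pal(rh),
             label = ~paste0(round(rh, 2), ' rh'),
             opacity = 1, fillOpacity = 1, radius = 500) %>%
  addMarkers(lat = ~lat, lng = ~lon,
             label = ~paste0(round(rh, 2), ' rh'), data = top10) %>%
  addLegend('bottomleft', pal = rh_pal, values = met_avg2$rh,
            title = "Relative Humidity", opacity = 1)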
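`rank()` gives tied values their average rank by default. Selecting rows with `rank(-rh) <= 10` can therefore return slightly more or fewer than ten stations when humidity values tie around tenth place. A minimal alternative sketch uses `dplyr::slice_max()` with `with_ties = FALSE`, which returns at most `n` rows. The helper name `top_rh` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: the n stations with the highest relative humidity,
# ignoring missing values and breaking ties so at most n rows are returned.
top_rh <- function(df, n = 10) {
  df %>%
    filter(!is.na(rh)) %>%
    slice_max(rh, n = n, with_ties = FALSE)
}
```

With this helper, `top10 <- top_rh(met_avg)` would replace the `rank()`-based selection.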
Use a ggplot extension to make a plot of our choosing
I would like to analyse atmospheric pressure variation by region, so I will use the ridgeline plot extension ggridges.
library(ggridges)

# Keep stations with a pressure value and an elevation category (one panel per category)
met_avg %>%
  filter(!is.na(atm.press) & !is.na(elev_cat)) %>%
  ggplot(mapping = aes(x = atm.press, y = region, fill = region)) +
  geom_density_ridges(alpha = 0.7) +
  facet_wrap(~elev_cat, nrow = 2) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Atmospheric Pressure Distributions by Region",
       x = "Atmospheric Pressure", y = "Region", fill = "Region") +
  theme_bw() +
  theme(legend.position = "bottom")
Plot Description:
These ridgeline plots show regional differences in atmospheric pressure in the high and low elevation categories, with one panel per category. The Northeast and Northwest cluster around higher pressures, while the Southeast and Southwest display more variable distributions. In the low elevation panel, the Southwest peaks at lower pressures and shows greater variability than the other regions.
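To back the ridgeline description with numbers, you can summarise each panel by region. The sketch below reports the median, the interquartile range, and the station count of `atm.press` per elevation category and region. The helper name `pressure_summary` is hypothetical, and the function assumes `atm.press`, `elev_cat` and `region` columns as in `met_avg`.

```r
library(dplyr)

# Hypothetical helper: median, IQR and station count of atmospheric pressure
# for each elevation category x region combination shown in the ridgelines.
pressure_summary <- function(df) {
  df %>%
    filter(!is.na(atm.press), !is.na(elev_cat), !is.na(region)) %>%
    group_by(elev_cat, region) %>%
    summarise(median_press = median(atm.press),
              iqr_press = IQR(atm.press),
              n = n(),
              .groups = "drop")
}
```

In this summary, a larger `iqr_press` corresponds to a wider ridge in the plot.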